Automatic Annotation Techniques for Supervised and Semi-supervised Query-focused Summarization
Abstract
In this paper, we study one semi-supervised and several supervised methods for extractive query-focused multi-document summarization. Traditional approaches to multi-document summarization are either unsupervised or supervised. Unsupervised approaches select the most important sentences with heuristic rules, which are hard to generalize. Supervised training, on the other hand, requires a large amount of annotated data, which is rarely available for a research problem as new as query-focused summarization. However, the abstract summaries available from different evaluation frameworks allow us to experiment with a semi-supervised approach and with sentence alignment methods that annotate the document sentences automatically. We employ five automatic annotation techniques to build extracts from the human abstracts: TF*IDF (TF = term frequency, IDF = inverse document frequency) based cosine similarity, Extended String Subsequence Kernel (ESSK), Basic Element (BE) overlap, syntactic similarity, and semantic similarity. Based on these annotations, we experiment with: a) two supervised multi-class classifiers, Support Vector Machines (SVM) and Logistic Regression (LR); b) three regression models, SVM, Bagging, and Gaussian Processes (GP); and c) one sequence labeler, Conditional Random Fields (CRF). Our initial results with the SVM classifier, based on a very small subset of the DUC-2006 and DUC-2007 data, show the effectiveness of our approaches.
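The first annotation technique named in the abstract, TF*IDF-based cosine similarity, can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the threshold value, and the function names (`tfidf_vectors`, `cosine`, `annotate`) are all assumptions made for illustration. A document sentence is labeled as a summary sentence when its best cosine similarity to any abstract sentence reaches the threshold.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def tfidf_vectors(sentences):
    """Return one sparse TF*IDF vector (a dict) per sentence."""
    token_lists = [tokenize(s) for s in sentences]
    n = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    # Assumption: smoothed IDF; the abstract does not give the exact weighting.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def annotate(doc_sentences, abstract_sentences, threshold=0.3):
    """Label each document sentence 1 (summary) or 0 (non-summary).

    The threshold of 0.3 is a hypothetical value, not taken from the paper.
    """
    vectors = tfidf_vectors(doc_sentences + abstract_sentences)
    doc_vecs = vectors[:len(doc_sentences)]
    abs_vecs = vectors[len(doc_sentences):]
    labels = []
    for dv in doc_vecs:
        best = max((cosine(dv, av) for av in abs_vecs), default=0.0)
        labels.append(1 if best >= threshold else 0)
    return labels
```

For example, `annotate(["global warming raises sea levels", "the cat sat on the mat"], ["sea levels rise because of global warming"])` labels the first sentence 1 and the second 0. Labels produced this way could then serve as training targets for the classifiers listed in the abstract.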
Similar resources
Query-focused Multi-Document Summarization: Combining a Topic Model with Graph-based Semi-supervised Learning
Graph-based learning algorithms have been shown to be an effective approach for query-focused multi-document summarization (MDS). In this paper, we extend the standard graph ranking algorithm by proposing a two-layer (i.e. sentence layer and topic layer) graph-based semi-supervised learning approach based on topic modeling techniques. Experimental results on TAC datasets show that by considerin...
Query-Focused Multi-Document Summarization Using Co-Training Based Semi-Supervised Learning
This paper presents a novel approach to query-focused multi-document summarization. As a good biased summary is expected to keep a balance among query relevance, content salience and information diversity, the approach first makes use of both the content feature and the relationship feature to select a number of sentences via co-training-based semi-supervised learning, which can identify the...
Semi-supervised extractive speech summarization via co-training algorithm
Supervised methods for extractive speech summarization require a large training set. Summary annotation is often expensive and time consuming. In this paper, we exploit semi-supervised approaches to leverage unlabeled data. In particular, we investigate co-training for the task of extractive meeting summarization. Compared with text summarization, speech summarization task has its unique charac...
Feature expansion for query-focused supervised sentence ranking
We present a supervised sentence ranking approach for use in extractive summarization. Using a general machine learning technique provides great flexibility for incorporating varied new features, which we demonstrate. The system proves quite effective at query-focused multi-document summarization, both for single summaries and for series of update summaries.
A Novel Feature-based Bayesian Model for Query Focused Multi-document Summarization
Supervised learning methods and LDA-based topic models have been successfully applied in the field of multi-document summarization. In this paper, we propose a novel supervised approach that can incorporate rich sentence features into Bayesian topic models in a principled way, thus taking advantage of both topic models and feature-based supervised learning methods. Experimental results on DUC200...